CROSS-REFERENCE TO RELATED APPLICATION

This is a continuation-in-part of application Serial No.
07/879,689, filed May 5, 1992, presently pending. The entire
disclosure of the prior application is relied upon and 
incorporated herein by reference. 

BACKGROUND OF THE INVENTION
This invention relates to a nucleotide sequence that
encodes the restriction endonuclease I-SceI. This invention
also relates to vectors containing the nucleotide sequence,
cells transformed with the vectors, transgenic animals based
on the vectors, and cell lines derived from cells in the
animals. This invention also relates to the use of I-SceI for
mapping eukaryotic genomes and for in vivo site-directed
genetic recombination.

The ability to introduce genes into the germ line of 
mammals is of great interest in biology. The propensity of 
mammalian cells to take up exogenously added DNA and to ex- 
press genes included in the DNA has been known for many 
years. The results of gene manipulation are inherited by the 
offspring of these animals. All cells of these offspring 
inherit the introduced gene as part of their genetic make-up. 
Such animals are said to be transgenic. 

Transgenic mammals have provided a means for studying 
gene regulation during embryogenesis and in differentiation, 
for studying the action of genes, and for studying the
intricate interaction of cells in the immune system. The whole
animal is the ultimate assay system for manipulated genes,
which direct complex biological processes.

Transgenic animals can provide a general assay for 
functionally dissecting DNA sequences responsible for tissue 
specific or developmental regulation of a variety of genes. 
In addition, transgenic animals provide useful vehicles for 
expressing recombinant proteins and for generating precise 
animal models of human genetic disorders. 

For a general discussion of gene cloning and expression
in animals and animal cells, see Old and Primrose,
"Principles of Gene Manipulation," Blackwell Scientific
Publications, London (1989), page 255 et seq.

Transgenic lines, which have a predisposition to spe- 
cific diseases and genetic disorders, are of great value in 
the investigation of the events leading to these states. It 
is well known that the efficacy of treatment of a genetic 
disorder may be dependent on identification of the gene de- 
fect that is the primary cause of the disorder. The discov- 
ery of effective treatments can be expedited by providing an 
animal model that will lead to the disease or disorder, which 
will enable the study of the efficacy, safety, and mode of 
action of treatment protocols, such as genetic recombination. 

One of the key issues in understanding genetic recombi- 
nation is the nature of the initiation step. Studies of ho- 
mologous recombination in bacteria and fungi have led to the 
proposal of two types of initiation mechanisms. In the first 
model, a single-strand nick initiates strand assimilation and 
branch migration (Meselson and Radding 1975). Alternatively,
a double-strand break may occur, followed by a repair
mechanism that uses an uncleaved homologous sequence as a
template (Resnick and Martin 1976). This latter model has
gained support from the fact that integrative transformation
in yeast is dramatically increased when the transforming
plasmid is linearized in the region of chromosomal homology
(Orr-Weaver, Szostak and Rothstein 1981) and from the direct
observation of a double-strand break during mating type
interconversion of yeast (Strathern et al. 1982). Recently,
double-strand breaks have also been characterized during
normal yeast meiotic recombination (Sun et al. 1989; Alani,
Padmore and Kleckner 1990).

Several double-strand endonuclease activities have been
characterized in yeast: HO and intron encoded endonucleases
are associated with homologous recombination functions, while
others still have unknown genetic functions (Endo-SceI,
Endo-SceII) (Shibata et al. 1984; Morishima et al. 1990). The
HO site-specific endonuclease initiates mating-type
interconversion by making a double-strand break near the YZ
junction of MAT (Kostriken et al. 1983). The break is
subsequently repaired using the intact HML or HMR sequences,
resulting in ectopic gene conversion. The HO recognition
site is a degenerate 24 bp non-symmetrical sequence
(Nickoloff, Chen, and Heffron 1986; Nickoloff, Singer and
Heffron 1990). This sequence has been used as a
"recombinator" in artificial constructs to promote intra- and
intermolecular mitotic and meiotic recombination (Nickoloff,
Chen and Heffron 1986; Kolodkin, Klar and Stahl 1986; Ray et
al. 1988; Rudin and Haber 1988; Rudin, Sugarman, and Haber
1989).

The two site-specific endonucleases, I-SceI (Jacquier
and Dujon 1985) and I-SceII (Delahodde et al. 1989; Wenzlau
et al. 1989), that are responsible for intron mobility in
mitochondria, initiate a gene conversion that resembles the
HO-induced conversion (see Dujon 1989 for review). I-SceI,
which is encoded by the optional intron Sc LSU.1 of the 21S
rRNA gene, initiates a double-strand break at the intron
insertion site (Macreadie et al. 1985; Dujon et al. 1985;
Colleaux et al. 1986). The recognition site of I-SceI
extends over an 18 bp non-symmetrical sequence (Colleaux et
al. 1988). Although the two proteins are not obviously related
by their structure (HO is 586 amino acids long while I-SceI
is 235 amino acids long), they both generate 4 bp staggered
cuts with 3' OH overhangs within their respective recognition
sites. It has been found that a mitochondrial intron-encoded
endonuclease, transcribed in the nucleus and translated in
the cytoplasm, generates a double-strand break at a nuclear
site. The repair events induced by I-SceI are identical to
those initiated by HO.

In summary, there exists a need in the art for reagents 
and methods for providing transgenic animal models of human 
diseases and genetic disorders. The reagents can be based on
the restriction enzyme I-SceI and the gene encoding this
enzyme. In particular, there exists a need for reagents and
methods for replacing a natural gene with another gene that
is capable of alleviating the disease or genetic disorder.

SUMMARY OF THE INVENTION

Accordingly, this invention aids in fulfilling these
needs in the art. Specifically, this invention relates to an
isolated DNA encoding the enzyme I-SceI. The DNA has the
following nucleotide sequence:
2671 AAC CTC GGT CCG AAC TCT AAA
  13  N   L   G   P   N   S   K

2731 ATC GAA CAG TTC GAA GCA GGT
  33  I   E   Q   F   E   A   G

2791 GAT GAA GGT AAA ACC TAC TCT
  53  D   E   G   K   T   Y   C

2851 GTA TCT CTC CTC TAC GAT CAG
  73  V   C   L   L   Y   D   Q

2911 CAC CTC GGT AAC CTC GTA ATC
  93  H   L   G   N   L   V   I

2971 AAA CTC GCT AAC CTC TTC ATC
 113  K   L   A   N   L   F   I

3031 AAC TAC CTC ACC CCG ATC TCT
 133  H   Y   L   T   P   M   S

3091 TAC AAC AAA AAC TCT ACC AAC
 153  Y   N   K   N   S   T   N

3151 GAA GTA GAA TAC CTC GTT AAC
 173  E   V   E   Y   L   V   K

3211 ATC AAC AAA AAC AAA CCG ATC
 193  I   N   K   N   K   P   I

3271 CTC ATC AAA CCG TAC CTC ATC
 213  L   I   K   P   Y   L   I

X. A Y H

AAA TCG ATC GTA CTC
 K   S   I   V   L

GGT CTC CGT AAC AAA
 G   L   R   M   K

ATC TAC ATC GAT TCT
 I   Y   I   D   S

CCG CAC ATC ATG TAC
 P   Q   H   M   Y

This invention also relates to a DNA sequence comprising
a promoter operatively linked to the DNA sequence of the
invention encoding the enzyme I-SceI.

This invention further relates to an isolated RNA
complementary to the DNA sequence of the invention encoding
the enzyme I-SceI and to the other DNA sequences described
herein.

In another embodiment of the invention, a vector is
provided. The vector comprises a plasmid, bacteriophage, or
cosmid vector containing the DNA sequence of the invention
encoding the enzyme I-SceI.

In addition, this invention relates to E. coli or 
eukaryotic cells transformed with a vector of the invention. 

Also, this invention relates to transgenic animals
containing the DNA sequence encoding the enzyme I-SceI and
cell lines cultured from cells of the transgenic animals.

In addition, this invention relates to a transgenic
organism in which at least one restriction site for the
enzyme I-SceI has been inserted in a chromosome of the
organism.

Further, this invention relates to a method of
genetically mapping a eukaryotic genome using the enzyme
I-SceI.

This invention also relates to a method for in vivo
site-directed recombination in an organism using the enzyme
I-SceI.

BRIEF DESCRIPTION OF THE DRAWINGS
This invention will be more fully described with
reference to the drawings in which:

Fig. 1 depicts the universal code equivalent of the
mitochondrial I-SceI gene.

Fig. 2 depicts the nucleotide sequence of the invention
encoding the enzyme I-SceI and the amino acid sequence of the
natural I-SceI enzyme.

Fig. 3 depicts the I-SceI recognition sequence and
indicates possible base mutations in the recognition site and
the effect of such mutations on stringency of recognition.

Fig. 4 is the nucleotide sequence and deduced amino acid
sequence of a region of plasmid pSCM525. The nucleotide
sequence of the invention encoding the enzyme I-SceI is
enclosed in the box.

Fig. 5 depicts variations around the amino acid sequence
of the enzyme I-SceI.

Fig. 6 shows Group I intron encoded endonucleases and
related endonucleases.

Fig. 7 depicts yeast expression vectors containing the
synthetic gene for I-SceI.

Fig. 8 depicts the mammalian expression vector pRSV
I-SceI.

Fig. 9 is a restriction map of the plasmid pAF100. (See
also YEAST, 6:521-534, 1990, which is relied upon and
incorporated by reference herein.)

Figs. 10A and 10B show the nucleotide sequence and
restriction sites of regions of the plasmid pAF100.

Fig. 11 depicts insertion vectors pTSMω, pTKMω, and
pTTcω containing the I-SceI site for E. coli and other
bacteria.

Fig. 12 depicts an insertion vector pTYW6 containing the
I-SceI site for yeast.

Fig. 13 depicts an insertion vector PMLV LTR SAPLZ
containing the I-SceI site for mammalian cells.

Fig. 14 depicts a set of seven transgenic yeast strains
cleaved by I-SceI. Chromosomes from FY1679 (control) and
from seven transgenic yeast strains with I-SceI sites
inserted at various positions along chromosome XI were
treated with I-SceI. DNA was electrophoresed on 1% agarose
(SeaKem) gel in 0.25 X TBE buffer at 130 V and 12°C on a
Rotaphor apparatus (Biometra) for 70 hrs using 100 sec to
40 sec decreasing pulse times. (A) DNA was stained with
ethidium bromide (0.2 µg/ml) and transferred to a Hybond N
(Amersham) membrane for hybridization. (B) 32P labelled
cosmid pUKG040 which hybridizes with the shortest fragment of
the set was used as a probe. Positions of chromosome XI and
shorter chromosomes are indicated.

Fig. 15 depicts the rationale of the nested chromosomal
fragmentation strategy for genetic mapping. (A) Positions of
I-SceI sites are placed on the map, irrespective of the left/
right orientation (shorter fragments are arbitrarily placed
on the left). Fragment sizes as measured from PFGE
(Fig. 14A) are indicated in kb (note that the sum of the two
fragment sizes varies slightly due to the limit of precision
of each measurement). (B) Hybridization with the probe that
hybridizes the shortest fragment of the set determines the
orientation of each fragment (see Fig. 14B). Fragments that
hybridize with the probe (full lines) have been placed
arbitrarily to the left. (C) Transgenic yeast strains have
been ordered with increasing sizes of hybridizing chromosome
fragments. (D) Deduced I-SceI map with minimal and maximal
size of intervals indicated in kb (variations in some
intervals are due to limitations of PFGE measurements). (E)
Chromosome subfragments are used as probes to assign each
cosmid clone to a given map interval or across a given I-SceI
site.

Fig. 16 depicts mapping of the I-SceI sites of
transgenic yeast strains by hybridization with left end and
right end probes of chromosome XI. Chromosomes from FY1679
(control) and the seven transgenic yeast strains were
digested with I-SceI. Transgenic strains were placed in
order as explained in Fig. 15. Electrophoresis conditions
were as in Fig. 14. 32P labelled cosmids pUKG040 and pUKG066
were used as left end and right end probes, respectively.

Fig. 17 depicts mapping of a cosmid collection using the
nested chromosomal fragments as probes. Cosmid DNAs were
digested with EcoRI and electrophoresed on 0.9% agarose
(SeaKem) gel at 1.5 V/cm for 14 hrs, stained with ethidium
bromide and transferred to a Hybond N membrane. Cosmids were
placed in order from previous hybridizations to help
visualize the strategy. Hybridizations were carried out
serially on three identical membranes using left end nested
chromosome fragments purified on PFGE (see Fig. 16) as
probes. A: ethidium bromide staining (ladder is the BRL "1kb
ladder"), B: membrane #1, probe: Left tel to A302 site,
C: membrane #1, probe: Left tel to M57 site, D: membrane #2,
probe: Left tel to H81 site, E: membrane #2, probe: Left tel
to T62 site, F: membrane #3, probe: Left tel to G41 site, G:
membrane #3, probe: Left tel to D304 site, H: membrane #3,
probe: entire chromosome XI.

Fig. 18 depicts a map of the yeast chromosome XI as
determined from the nested chromosomal fragmentation
strategy. The chromosome is divided into eight intervals
(with sizes indicated in kb, see Fig. 15D) separated by seven
I-SceI sites (E40, A302 ...). Cosmid clones falling either
within intervals or across a given I-SceI site are listed
below intervals or below interval boundaries, respectively.
Cosmid clones that hybridize with selected genes used as
probes are indicated by letters (a-i). They localize the
gene with respect to the I-SceI map and allow comparison with
the genetic map (top).

Fig. 19 depicts diagrams of successful site directed 
homologous recombination experiments performed in yeast. 
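
The ordering step described for Fig. 15 (C) and (D) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to produce the figures: the function name `order_sites`, the site labels, and every fragment size below are hypothetical. It assumes that each strain carries a single inserted site and that the fragment hybridizing with the left-end probe spans from the left end of the chromosome to that site.

```python
def order_sites(left_fragment_kb, chromosome_kb):
    """Order inserted sites by the size of their left-end fragment.

    left_fragment_kb maps a (hypothetical) site label to the size, in kb,
    of the fragment that hybridizes with the left-end probe.
    Returns the ordered labels and the sizes of the intervals between
    consecutive sites, including both chromosome ends.
    """
    ordered = sorted(left_fragment_kb.items(), key=lambda item: item[1])
    names = [name for name, _ in ordered]
    # Site positions measured from the left end, flanked by both chromosome ends.
    positions = [0] + [size for _, size in ordered] + [chromosome_kb]
    intervals = [right - left for left, right in zip(positions, positions[1:])]
    return names, intervals


# Hypothetical measurements (kb), for illustration only.
example_sizes = {"siteA": 100, "siteB": 40, "siteC": 250}
example_order, example_intervals = order_sites(example_sizes, chromosome_kb=300)
```

With n sites the sketch always returns n + 1 intervals, which is consistent with the seven sites and eight intervals described for Fig. 18. Real PFGE sizes carry measurement error, so each interval would be a range rather than a single value.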

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The genuine mitochondrial gene (ref. 8) cannot be
expressed in E. coli, yeast or other organisms due to the
peculiarities of the mitochondrial genetic code. A "universal
code equivalent" has been constructed by in vitro
site-directed mutagenesis. Its sequence is given in Fig. 1.
Note that all non-universal codons (except two CTN) have been
replaced together with some codons extremely rare in E. coli.

The universal code equivalent has been successfully
expressed in E. coli and determines the synthesis of an ac- 
tive enzyme. However, expression levels remained low due to 
the large number of codons that are extremely rare in E. 
coli. Expression of the "universal code equivalent" has been 
detected in yeast. 

To optimize gene expression in heterologous systems, a 
synthetic gene has been designed to encode a protein with the 
genuine amino acid sequence of I-SceI using, for each codon,
that most frequently used in E. coli. The sequence of the
synthetic gene is given in Fig. 2. The synthetic gene was 
constructed in vitro from eight synthetic oligonucleotides 
with partial overlaps. Oligonucleotides were designed to 
allow mutual priming for second strand synthesis by Klenow 
polymerase when annealed by pairs. The elongated pairs were 
then ligated into plasmids. Appropriately placed restriction 
sites within the designed sequence allowed final assembly of 
the synthetic gene by in vitro ligation. The synthetic gene 
has been successfully expressed in both E. coli and yeast. 
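
The design rule stated above (keep the amino acid sequence, and for each residue use the codon most frequently used in the host) can be sketched as a simple back-translation. The sketch below is an assumption-laden illustration, not the inventors' design procedure: the function name `back_translate` is hypothetical, and the codon table is a placeholder set of commonly cited E. coli preferences that is not taken from this text and should be replaced with a measured codon-usage table before any real use.

```python
# Placeholder table of one preferred codon per amino acid.
# ASSUMPTION: illustrative values only, not taken from the specification.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGT", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTC", "P": "CCG",
    "S": "TCT", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTT",
}


def back_translate(protein, table=PREFERRED_CODON):
    """Return a coding sequence using one fixed codon per residue.

    Raises KeyError if the protein contains a letter missing from the table.
    """
    return "".join(table[residue] for residue in protein.upper())


# Example with the N-terminal Met-His mentioned for the artificial gene.
example_cds = back_translate("MHK")
```

A fixed one-codon-per-residue table is the simplest reading of the rule. It ignores practical constraints that the text also mentions, such as placing restriction sites within the designed sequence to allow assembly.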

1. I-SceI Gene Sequence

This invention relates to an isolated DNA sequence
encoding the enzyme I-SceI. The enzyme I-SceI is an
endonuclease. The properties of the enzyme (ref. 14) are as
follows:

I-SceI is a double-stranded endonuclease that cleaves
DNA within its recognition site. I-SceI generates a 4 bp
staggered cut with 3' OH overhangs.

Substrate: Acts only on double-stranded DNA. Substrate
DNA can be relaxed or negatively supercoiled.

Cations: Enzymatic activity requires Mg++ (8 mM is
optimum). Mn++ can replace Mg++, but this reduces the
stringency of recognition.

Optimum conditions for activity: high pH (9 to 10),
temperature 20-40°C, no monovalent cations.

Enzyme stability: I-SceI is unstable at room
temperature. The enzyme-substrate complex is more stable than
the enzyme alone (presence of recognition sites
stabilizes the enzyme).



The enzyme I-SceI has a known recognition site (ref.
14). The recognition site of I-SceI is a non-symmetrical
sequence that extends over 18 bp as determined by systematic
mutational analysis. The sequence reads: (arrows indicate
cuts)

The recognition site corresponds, in part, to the upstream
exon and, in part, to the downstream exon of the intron-plus
form of the gene.

The recognition site is partially degenerate: single 
base substitutions within the 18 bp long sequence result in 
either complete insensitivity or reduced sensitivity to the 
enzyme, depending upon position and nature of the substitu- 
tion. 
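
Because the site is non-symmetrical, a search for it in a DNA sequence has to examine both strands, and because it is partially degenerate, near matches are also of interest. The sketch below illustrates this under loud assumptions: the 18-mer in `SITE` is the I-SceI site as commonly quoted in the literature and is not reproduced from this text, the function names are hypothetical, and counting mismatches uniformly is a simplification, since the text states that sensitivity depends on the position and nature of each substitution.

```python
# ASSUMPTION: 18 bp site as commonly quoted in the literature,
# not copied from this specification.
SITE = "TAGGGATAACAGGGTAAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(genome, site=SITE, max_mismatches=0):
    """List (start, strand, mismatches) for every window within the limit.

    The start index is always given on the top strand; strand "-" means
    the site is read on the bottom strand.
    """
    hits = []
    width = len(site)
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        for start in range(len(genome) - width + 1):
            window = genome[start:start + width]
            mismatches = sum(a != b for a, b in zip(window, pattern))
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits


exact_hits = find_sites("CC" + SITE + "GG")
```

A position-weighted score built from the mutant data of Fig. 3 would be closer to the behaviour described here than a flat mismatch count, but those weights are not available in this text.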

The stringency of recognition has been measured on:
-1- mutants of the site.

-2- the total yeast genome (Saccharomyces
cerevisiae, genome complexity is 1.4 x 10^7 bp). Data
are unpublished.
Results are:

-1- Mutants of the site: As shown in Fig. 3,
there is a general shifting of stringency, i.e., mutants
severely affected in Mg++ become partially affected in
Mn++, mutants partially affected in Mg++ become
unaffected in Mn++.

-2- Yeast: In magnesium conditions, no cleavage is
observed in normal yeast. In the same condition, DNA
from transgenic yeasts is cleaved to completion at the
artificially inserted I-SceI site and no other cleavage
site can be detected. If magnesium is replaced by
manganese, five additional cleavage sites are revealed,
an average of 1 site for ca. 3 million base pairs
(5/1.4 x 10^7 bp).

Definition of the recognition site: important
bases are indicated in Fig. 3. They correspond to bases
for which severely affected mutants exist. Notice
however that:

-1- All possible mutations at each position have
not been determined; therefore a base that does not
correspond to a severely affected mutant may still be
important if another mutant was examined at this very
same position.

-2- There is no clear-cut limit between a very
important base (all mutants are severely affected) and a
moderately important base (some of the mutants are
severely affected). There is a continuum between
excellent substrates and poor substrates for the enzyme.

The expected frequency of natural I-SceI sites in a
random DNA sequence is, therefore, equal to (0.25)^18 or
(1.5 x 10^-11). In other words, one should expect one
natural site for the equivalent of ca. 20 human genomes,
but the frequency of degenerate sites is more difficult
to predict.
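
The two frequency estimates given above can be checked with a short calculation. The sketch assumes a random sequence with equal base frequencies, as the text does, and takes an approximate haploid human genome size of 3 x 10^9 bp, a figure that is an assumption introduced here and does not appear in the text. The variable names are illustrative.

```python
SITE_LENGTH_BP = 18

# Probability that a given position in random DNA starts a full site.
site_probability = 0.25 ** SITE_LENGTH_BP      # about 1.5e-11

# Equivalent statement: one expected site per 4**18 base pairs.
bp_per_site = 4 ** SITE_LENGTH_BP              # 68,719,476,736 bp

# ASSUMPTION: approximate haploid human genome size, not from the text.
HUMAN_GENOME_BP = 3e9
genomes_per_site = bp_per_site / HUMAN_GENOME_BP   # roughly 20

# Degenerate sites observed with manganese in yeast (values from the text).
YEAST_GENOME_BP = 1.4e7
MANGANESE_SITES = 5
bp_per_manganese_site = YEAST_GENOME_BP / MANGANESE_SITES   # about 3 million bp
```

The gap between roughly 7 x 10^10 bp per site for a strict 18 bp match and roughly 3 x 10^6 bp per site observed in manganese shows how strongly relaxed stringency raises the number of cleavable sites.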

I-SceI belongs to a "degenerate" subfamily of the
two-dodecapeptide family. Conserved amino acids of the
dodecapeptide motifs are required for activity. In
particular, the aspartic residues at positions 9 of the
two dodecapeptides cannot be replaced, even with
glutamic residues. It is likely that the dodecapeptides
form the catalytic site or part of it.

Consistent with the recognition site being
non-symmetrical, it is likely that the endonucleolytic
activity of I-SceI requires two successive recognition
steps: binding of the enzyme to the downstream half of
the site (corresponding to the downstream exon) followed
by binding of the enzyme to the upstream half of the
site (corresponding to the upstream exon). The first
binding is strong, the second is weaker, but the two are
necessary for cleavage of DNA. In vitro, the enzyme can
bind the downstream exon alone as well as the intron-
exon junction sequence, but no cleavage results.
The evolutionarily conserved dodecapeptide motifs of
intron-encoded I-SceI are essential for endonuclease
activity. It has been proposed that the role of these motifs
is to properly position the acidic amino acids with respect
to the DNA sequence recognition domains of the enzyme for the
catalysis of phosphodiester bond hydrolysis (ref. P3).

The nucleotide sequence of the invention, which encodes
the natural I-SceI enzyme, is shown in Fig. 2. The nucleotide
sequence of the gene of the invention was derived by
dideoxynucleotide sequencing. The base sequences of the
nucleotides are written in the 5' -> 3' direction. Each of
the letters shown is a conventional designation for the
following nucleotides:

A Adenine
G Guanine
T Thymine
C Cytosine.
It is preferred that the DNA sequence encoding the
enzyme I-SceI be in a purified form. For instance, the
sequence can be free of human blood-derived proteins, human
serum proteins, viral proteins, nucleotide sequences encoding
these proteins, human tissue, human tissue components, or
combinations of these substances. In addition, it is
preferred that the DNA sequence of the invention is free of
extraneous proteins and lipids, and adventitious
microorganisms, such as bacteria and viruses. The essentially
purified and isolated DNA sequence encoding I-SceI is
especially useful for preparing expression vectors.

Plasmid pSCM525 is a pUC12 derivative, containing an
artificial sequence encoding the DNA sequence of the
invention. The nucleotide sequence and deduced amino acid
sequence of a region of plasmid pSCM525 is shown in Fig. 4.
The nucleotide sequence of the invention encoding I-SceI is
enclosed in the box. The artificial gene is a BamHI-SalI
piece of DNA sequence of 723 base pairs, chemically
synthesized and assembled. It is placed under tac promoter
control. The DNA sequence of the artificial gene differs
from the natural coding sequence or its universal code
equivalent described in Cell (1986), Vol. 44, pages 521-533.
However, the translation product of the artificial gene is
identical in sequence to the genuine omega-endonuclease
except for the addition of a Met-His at the N-terminus. It
will be understood that this modified endonuclease is within
the scope of this invention.

Plasmid pSCM525 can be used to transform any suitable E.
coli strain and transformed cells become ampicillin-
resistant. Synthesis of the omega-endonuclease is obtained
by addition of I.P.T.G. or an equivalent inducer of the
lactose operon system.

A plasmid identified as pSCM525 containing the enzyme
I-SceI was deposited in E. coli strain TG1 with the Collection
Nationale de Cultures de Microorganismes (C.N.C.M.) of
Institut Pasteur in Paris, France on November 22, 1990, under
culture collection deposit Accession No. I-1014. The
nucleotide sequence of the invention is thus available from
this deposit.

The gene of the invention can also be prepared by the
formation of 3' -> 5' phosphate linkages between nucleoside
units using conventional chemical synthesis techniques. For
example, the well-known phosphodiester, phosphotriester, and
phosphite triester techniques, as well as known modifications
of these approaches, can be employed. Deoxyribonucleotides
can be prepared with automatic synthesis machines, such as
those based on the phosphoramidite approach. Oligo- and
polyribonucleotides can also be obtained with the aid of RNA
ligase using conventional techniques.

This invention of course includes variants of the DNA
sequence of the invention exhibiting substantially the same
properties as the sequence of the invention. By this it is
meant that DNA sequences need not be identical to the
sequence disclosed herein. Variations can be attributable to
single or multiple base substitutions, deletions, or
insertions or local mutations involving one or more
nucleotides not substantially detracting from the properties
of the DNA sequence as encoding an enzyme having the cleavage
properties of the enzyme I-SceI.

Fig. 5 depicts some of the variations that can be made
around the I-SceI amino acid sequence. It has been
demonstrated that the following positions can be changed
without affecting enzyme activity:

positions -1 and -2 are not natural. The two amino
acids are added due to cloning strategies,
positions 1 to 10: can be deleted,
position 36: S is tolerated,
position 40: M or V are tolerated,
position 41: S or N are tolerated,
position 43: A is tolerated,
position 46: Y or N are tolerated,
position 91: A is tolerated,
positions 123 and 156: L is tolerated,
position 223: A and S are tolerated.

It will be understood that enzymes containing these
modifications are within the scope of this invention.

Changes to the amino acid sequence in Fig. 5 that have
been demonstrated to affect enzyme activity are as follows:

position 19: Ij to
position 38: I to S or N
position 39: s to D or R
position 40: L to Q
position 42: L to R
position 44: D to E, G or g
position 45: ^ to S or 2
position 46: Y to S
position 47: I to g or N
position 80: L to S
position 144: D to S
position 145: D to E
position 146: fi to E
position 147: G to S

It will also be understood that the present invention is
intended to encompass fragments of the DNA sequence of the 
invention in purified form, where the fragments are capable 
of encoding enzymatically active I-Scel. 

The DNA sequence of the invention coding for the enzyme
I-SceI can be amplified in the well known polymerase chain
reaction (PCR), which is useful for amplifying all or
specific regions of the gene. See, e.g., S. Kwok et al., J.
Virol., 61:1690-1694 (1987); U.S. Patent 4,683,202; and U.S.
Patent 4,683,195. More particularly, DNA primer pairs of
known sequence positioned 10-300 base pairs apart that are
complementary to the plus and minus strands of the DNA to be
amplified can be prepared by well known techniques for the
synthesis of oligonucleotides. One end of each primer can be
extended and modified to create restriction endonuclease
sites when the primer is annealed to the DNA. The PCR
reaction mixture can contain the DNA, the DNA primer pairs,
four deoxyribonucleoside triphosphates, MgCl2, DNA polymerase,
and conventional buffers. The DNA can be amplified for a
number of cycles. It is generally possible to increase the
sensitivity of detection by using a multiplicity of cycles,
each cycle consisting of a short period of denaturation of the
DNA at an elevated temperature, cooling of the reaction
mixture, and polymerization with the DNA polymerase. Amplified
sequences can be detected by the use of a technique termed
oligomer restriction (OR). See, e.g., R. K. Saiki et al., Bio/
Technology 3:1008-1012 (1985).
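
The primer geometry described above, one primer matching the plus strand and one complementary to it further downstream, can be illustrated with a small in-silico sketch. Everything named here is hypothetical: the function `amplicon`, the template, and both primer sequences are made up for illustration, and the sketch only handles exact matches on a linear template. It does not model annealing temperature, primer extensions carrying restriction sites, or mismatches.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon(template, forward_primer, reverse_primer):
    """Return the region bounded by the two primers, or None.

    forward_primer is identical to the plus strand; reverse_primer is
    identical to the minus strand, so its reverse complement is searched
    for on the plus strand, downstream of the forward primer.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    downstream_site = reverse_complement(reverse_primer)
    end = template.find(downstream_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(downstream_site)]


# Made-up template and primers, for illustration only.
example_template = "AAAA" + "CGTTGCA" + "TTTTTGGGGGC" + "CATGAGTC" + "AAAA"
example_product = amplicon(example_template, "CGTTGCA", "GACTCATG")
```

The returned product includes both primer sequences, as a real amplification product would.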

The enzyme I-SceI is one of a number of endonucleases with
similar properties. Following is a listing of related
enzymes and their sources.

Group I intron encoded endonucleases and related enzymes
are listed below with references. Recognition sites are
shown in Fig. 6.

Enzyme      Encoded by             Ref

I-SceI      Sc LSU-1 intron        this work
I-SceII     Sc cox1-4 intron       Sargueil et al., NAR
                                   (1990) 18, 5659-5665
I-SceIII    Sc cox1-3 intron       Sargueil et al., MGG
                                   (1991) 225, 340-341
I-SceIV     Sc cox1-5a intron      Seraphin et al. (1992)
                                   in press
I-CeuI      Ce LSU-5 intron        Marshall, Lemieux, Gene
                                   (1991) 104, 241-245
I-CreI      Cr LSU-1 intron        Rochaix (unpublished)
I-PpoI      Pp LSU-3 intron        Muscarella et al., MCB
                                   (1990) 10, 3386-3396
I-TevI      T4 td-1 intron         Chu et al., PNAS (1990)
                                   87, 3574-3578 and Bell-
                                   Pedersen et al., NAR
                                   (1990) 18, 3763-3770
I-TevII     T4 sunY intron         Bell-Pedersen et al., NAR
                                   (1990) 18, 3763-3770
I-TevIII    RB3 nrdB-1 intron      Eddy, Gold, Genes Dev.
                                   (1991) 5, 1032-1041
HO          HO yeast gene          Nickoloff et al., MCB
                                   (1990) 10, 1174-1179
Endo SceI   RF3 yeast mito. gene   Kawasaki et al., JBC
                                   (1991) 266, 5342-5347

Putative new enzymes (genetic evidence but no activity
as yet) are I-CsmI from cytochrome b intron 1 of
Chlamydomonas smithii mitochondria (ref. 15), I-PanI from
cytochrome b intron 3 of Podospora anserina mitochondria
(Jill Salvo), and probably enzymes encoded by introns Nc
nd1·1 and Nc cob·1 from Neurospora crassa.

The I-endonucleases can be classified as follows:

Class I: Two dodecapeptide motifs, 4 bp staggered cut with
3' OH overhangs, cut internal to recognition site

Subclass "I-SceI"    Other subclasses

I-SceI               I-SceII
I-SceIV              I-SceIII
I-CsmI               I-CeuI (only one dodecapeptide motif)
I-PanI               I-CreI (only one dodecapeptide motif)
                     HO
                     TFP1-408 (HO homolog)
                     Endo SceI

Class II: GIY-(N10-11)-YIG motif, 2 bp staggered cut with 3'
OH overhangs, cut external to recognition site:
I-TevI

Class III: no typical structural motifs, 4 bp staggered cut
with 3' OH overhangs, cut internal to recognition site:
I-PpoI

Class IV: no typical structural motifs, 2 bp staggered cut
with 3' OH overhangs, cut external to recognition site:
I-TevII

Class V: no typical structural motifs, 2 bp staggered cut
with 5' OH overhangs:
I-TevIII.
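
The classification above amounts to a small lookup table keyed on motif, stagger length, overhang type, and cut position. A minimal sketch of that table follows. The type name `EndonucleaseClass` and the helper `classes_matching` are hypothetical, and the field values are transcribed from the list above; the cut position for Class V is left as `None` because the text does not state it.

```python
from typing import NamedTuple, Optional


class EndonucleaseClass(NamedTuple):
    name: str
    motif: str
    stagger_bp: int
    overhang: str
    cut_position: Optional[str]   # None where the text gives no value


CLASSES = [
    EndonucleaseClass("I", "two dodecapeptide motifs", 4, "3' OH", "internal"),
    EndonucleaseClass("II", "GIY-YIG", 2, "3' OH", "external"),
    EndonucleaseClass("III", "none", 4, "3' OH", "internal"),
    EndonucleaseClass("IV", "none", 2, "3' OH", "external"),
    EndonucleaseClass("V", "none", 2, "5' OH", None),
]


def classes_matching(stagger_bp, overhang):
    """Return the names of classes with the given cut geometry."""
    return [
        entry.name
        for entry in CLASSES
        if entry.stagger_bp == stagger_bp and entry.overhang == overhang
    ]


four_bp_three_prime = classes_matching(4, "3' OH")
```

Cut geometry alone does not separate Class I from Class III, or Class II from Class IV; the structural motif is what distinguishes them in this scheme.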
2. Nucleotide Probes Containing the I-SceI
Gene of The Invention

The DNA sequence of the invention coding for the enzyme
I-SceI can also be used as a probe for the detection of a
nucleotide sequence in a biological material, such as tissue
or body fluids. The probe can be labeled with an atom or
inorganic radical, most commonly using a radionuclide, but
also perhaps with a heavy metal. Radioactive labels include
32P, 3H, 14C, or the like. Any radioactive label can be
employed, which provides for an adequate signal and has
sufficient half-life. Other labels include ligands that can
serve as a specific binding member to a labeled antibody,
fluorescers, chemiluminescers, enzymes, antibodies which can
serve as a specific binding pair member for a labeled ligand,
and the like. The choice of the label will be governed by
the effect of the label on the rate of hybridization and
binding of the probe to the DNA or RNA. It will be necessary
that the label provide sufficient sensitivity to detect the
amount of DNA or RNA available for hybridization.

When the nucleotide sequence of the invention is used as a 
probe for hybridizing to a gene, the nucleotide sequence is 
preferably affixed to a water insoluble solid, porous sup- 
port, such as nitrocellulose paper. Hybridization can be 
carried out using labeled polynucleotides of the invention 
and conventional hybridization reagents. The particular hy- 
bridization technique is not essential to the invention. 

The amount of labeled probe present in the hybridization 
solution will vary widely, depending upon the nature of the 
label, the amount of the labeled probe which can reasonably
bind to the support, and the stringency of the hybridization. 
Generally, substantial excesses of the probe over 
stoichiometric will be employed to enhance the rate of bind- 
ing of the probe to the fixed DNA. 

Various degrees of stringency of hybridization can be 
employed. The more severe the conditions, the greater the 
complementarity that is required for hybridization between 
the probe and the polynucleotide for duplex formation. Se- 
verity can be controlled by temperature, probe concentration, 
probe length, ionic strength, time, and the like. Conve- 
niently, the stringency of hybridization is varied by chang- 
ing the polarity of the reactant solution. Temperatures to 
be employed can be empirically determined or determined from 
well known formulas developed for this purpose. 
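
As an illustration of how hybridization temperature can be
estimated from a formula, the following is a minimal sketch of
one commonly quoted empirical approximation for the melting
temperature (Tm) of long DNA:DNA hybrids. The particular
formula and its constants, the function name
estimate_tm_celsius, and the example values are assumptions
chosen for illustration; they are not taken from this
specification, and empirical determination remains an
alternative.

```python
import math


def estimate_tm_celsius(na_molar, gc_percent, length_bp, formamide_percent=0.0):
    """Approximate Tm (degrees C) of a long DNA:DNA hybrid.

    Assumed empirical rule (illustrative only):
    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.61*(%formamide) - 500/L
    """
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("salt concentration and length must be positive")
    return (
        81.5
        + 16.6 * math.log10(na_molar)    # higher ionic strength stabilizes the duplex
        + 0.41 * gc_percent              # GC-rich duplexes melt at higher temperature
        - 0.61 * formamide_percent       # formamide lowers the melting temperature
        - 500.0 / length_bp              # short duplexes are less stable
    )


# Hypothetical probe: 500 bp, 50% GC, 1 M Na+, no formamide.
tm_no_formamide = estimate_tm_celsius(1.0, 50.0, 500)
# Same probe in 50% formamide: the estimate drops, so a lower
# temperature gives equivalent stringency.
tm_with_formamide = estimate_tm_celsius(1.0, 50.0, 500, formamide_percent=50.0)
```

Under this approximation, more stringent conditions correspond
to hybridizing or washing closer to the estimated Tm.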

3. Nucleotide Sequences Containing the
Nucleotide Sequence Encoding I-SceI

This invention also relates to the DNA sequence of the
invention encoding the enzyme I-SceI, wherein the nucleotide
sequence is linked to other nucleic acids. The nucleic acid
can be obtained from any source, for example, from plasmids,
from cloned DNA or RNA, or from natural DNA or RNA from any
source, including prokaryotic and eukaryotic organisms. DNA
or RNA can be extracted from a biological material, such as
biological fluids or tissue, by a variety of techniques,
including those described by Maniatis et al., Molecular
Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory,
New York (1982). The nucleic acid will generally be obtained
from bacteria, yeast, a virus, or a higher organism, such as
a plant or animal. The nucleic acid can be a fraction of a
more complex mixture, such as a portion of a gene contained
in whole human DNA or a portion of a nucleic acid sequence of
a particular microorganism. The nucleic acid can be a
fraction of a larger molecule or the nucleic acid can
constitute an entire gene or assembly of genes. The DNA can
be in a single-stranded or double-stranded form. If the
fragment is in single-stranded form, it can be converted to
double-stranded form using DNA polymerase according to
conventional techniques.

The DNA sequence of the invention can be linked to a 
structural gene. As used herein, the term "structural gene"
refers to a DNA sequence that encodes through its template or
messenger RNA (mRNA) a sequence of amino acids characteristic of a
specific protein or polypeptide. The nucleotide sequence of 
the invention can function with an expression control se- 
quence, that is, a DNA sequence that controls and regulates 
expression of the gene when operatively linked to the gene. 

4. Vectors Containing the Nucleotide
Sequence of the Invention

This invention also relates to cloning and expression 
vectors containing the DNA sequence of the invention coding 
for the enzyme I-SceI.

More particularly, the DNA sequence encoding the enzyme 
can be ligated to a vehicle for cloning the sequence. The 
major steps involved in gene cloning comprise procedures for
separating DNA containing the gene of interest from
prokaryotes or eukaryotes, cutting the resulting DNA fragment and
the DNA from a cloning vehicle at specific sites, mixing the 
two DNA fragments together, and ligating the fragments to 
yield a recombinant DNA molecule. The recombinant molecule 
can then be transferred into a host cell, and the cells al- 
lowed to replicate to produce identical cells containing 
clones of the original DNA sequence. 

The vehicle employed in this invention can be any
double-stranded DNA molecule capable of transporting the
nucleotide sequence of the invention into a host cell and
capable of replicating within the cell. More particularly,
the vehicle must contain at least one DNA sequence that can
act as the origin of replication in the host cell. In
addition, the vehicle must contain two or more sites for
insertion of the DNA sequence encoding the gene of the
invention. These sites will ordinarily correspond to
restriction enzyme sites at which cohesive ends can be
formed, and which are complementary to the cohesive ends on
the promoter sequence to be ligated to the vehicle. In
general, this invention can be carried out with plasmid,
bacteriophage, or cosmid vehicles having these
characteristics.

The nucleotide sequence of the invention can have cohe- 
sive ends compatible with any combination of sites in the 
vehicle. Alternatively, the sequence can have one or more 
blunt ends that can be ligated to corresponding blunt ends in 
the cloning sites of the vehicle. The nucleotide sequence to 
be ligated can be further processed, if desired, by
successive exonuclease deletion, such as with the enzyme Bal 31.
In the event that the nucleotide sequence of the invention 
does not contain a desired combination of cohesive ends, the 
sequence can be modified by adding a linker, an adaptor, or 
homopolymer tailing. 

It is preferred that plasmids used for cloning nucle- 
otide sequences of the invention carry one or more genes re- 
sponsible for a useful characteristic, such as a selectable 
marker, displayed by the host cell. In a preferred strategy, 
plasmids having genes for resistance to two different drugs 
are chosen. For example, insertion of the DNA sequence into 
a gene for an antibiotic inactivates the gene and destroys 
drug resistance. The second drug resistance gene is not af- 
fected when cells are transformed with the recombinants, and 
colonies containing the gene of interest can be selected by
resistance to the second drug and susceptibility to the first
drug. Preferred antibiotic markers are genes imparting 
chloramphenicol, ampicillin, or tetracycline resistance to 
the host cell. 

A variety of restriction enzymes can be used to cut the 
vehicle. The identity of the restriction enzyme will gener- 
ally depend upon the identity of the ends on the DNA sequence 
to be ligated and the restriction sites in the vehicle. The 
restriction enzyme is matched to the restriction sites in the 

The ligation reaction can be set up using well known
techniques and conventional reagents. Ligation is carried
out with a DNA ligase that catalyzes the formation of
phosphodiester bonds between adjacent 5'-phosphate and
free 3'-hydroxyl groups in DNA duplexes. The DNA ligase can
be derived from a variety of microorganisms. The preferred 
DNA ligases are enzymes from E. coli and bacteriophage T4. 
T4 DNA ligase can ligate DNA fragments with blunt or sticky 
ends, such as those generated by restriction enzyme diges- 
tion. E. coli DNA ligase can be used to catalyze the forma- 
tion of phosphodiester bonds between the termini of duplex 
DNA molecules containing cohesive ends. 

Cloning can be carried out in prokaryotic or eukaryotic 
cells. The host for replicating the cloning vehicle will of 
course be one that is compatible with the vehicle and in
which the vehicle can replicate. When a plasmid is employed, 
the plasmid can be derived from bacteria or some other organ- 
ism or the plasmid can be synthetically prepared. The plas- 
mid can replicate independently of the host cell chromosome 
or an integrative plasmid (episome) can be employed. The 
plasmid can make use of the DNA replicative enzymes of the 
host cell in order to replicate or the plasmid can carry 
genes that code for the enzymes required for plasmid replica- 
tion. A number of different plasmids can be employed in 
practicing this invention. 

The DNA sequence of the invention encoding the enzyme
I-SceI can also be ligated to a vehicle to form an expression
vector. The vehicle employed in this case is one in which it
is possible to express the gene operatively linked to a
promoter in an appropriate host cell. It is preferable to
employ a vehicle known for use in expressing genes in E. coli,
yeast, or mammalian cells. These vehicles include, for
example, the following E. coli expression vectors:
pSCM525, which is an E. coli expression vector derived from
    pUC12 by insertion of a tac promoter and the synthetic
    gene for I-SceI. Expression is induced by IPTG.
pGEXω6, which is an E. coli expression vector derived from
    pGEX in which the synthetic gene from pSCM525 for I-SceI
    is fused with the glutathione S transferase gene,
    producing a hybrid protein. The hybrid protein
    possesses the endonuclease activity.
pDIC73, which is an E. coli expression vector derived from
    pET-3C by insertion of the synthetic gene for I-SceI
    (NdeI - BamHI fragment of pSCM525) under T7 promoter
    control. This vector is used in strain BL21 (DE3), which
    expresses the T7 RNA polymerase under IPTG induction.
pSCM351, which is an E. coli expression vector derived from
    pUR291 in which the synthetic gene for I-SceI is fused
    with the Lac Z gene, producing a hybrid protein.
pSCM353, which is an E. coli expression vector derived from
    pEX1 in which the synthetic gene for I-SceI is fused
    with the Cro/Lac Z gene, producing a hybrid protein.

Examples of yeast expression vectors are:
pPEX7, which is a yeast expression vector derived from
    pRP51-Bam 0 (a LEU2d derivative of pLG-SD5) by insertion
    of the synthetic gene under the control of the galactose
    promoter. Expression is induced by galactose.
pPEX408, which is a yeast expression vector derived from
    pLG-SD5 by insertion of the synthetic gene under the
    control of the galactose promoter. Expression is
    induced by galactose.
Several yeast expression vectors are depicted in Fig. 7.

Typical mammalian expression vectors are: 
pRSV I-SceI, which is a pRSV derivative in which the
    synthetic gene (BamHI - PstI fragment from pSCM525) is
    under the control of the LTR promoter of Rous Sarcoma
    Virus. This expression vector is depicted in Fig. 8.
Vectors for expression in Chinese Hamster Ovary (CHO) cells 
can also be employed. 

5. Cells Transformed with Vectors of the Invention

The vectors of the invention can be inserted into host 
organisms using conventional techniques. For example, the 
vectors can be inserted by transformation, transfection, 
electroporation, microinjection, or by means of liposomes
(lipofection).

Cloning can be carried out in prokaryotic or eukaryotic 
cells. The host for replicating the cloning vehicle will of 
course be one that is compatible with the vehicle and in
which the vehicle can replicate. Cloning is preferably car- 
ried out in bacterial or yeast cells, although cells of fun- 
gal, animal, and plant origin can also be employed. The pre- 
ferred host cells for conducting cloning work are bacterial 
cells, such as E. coli. The use of E. coli cells is par- 
ticularly preferred because most cloning vehicles, such as 
bacterial plasmids and bacteriophages, replicate in these 
cells. 

In a preferred embodiment of this invention, an expres- 
sion vector containing the DNA sequence encoding the nucle- 
otide sequence of the invention operatively linked to a pro- 
moter is inserted into a mammalian cell using conventional 
techniques. 

Application of I-SceI for Large Scale Mapping

1. Occurrence of natural sites in various genomes

Using the purified I-SceI enzyme, the occurrence of
natural or degenerate sites has been examined on the complete
genomes of several species. No natural site was found in
Saccharomyces cerevisiae, Bacillus anthracis, Borrelia
burgdorferi, Leptospira biflexa and L. interrogans. One
degenerate site was found on T7 phage DNA.

2. Insertion of artificial sites

Given the absence of natural I-SceI sites, artificial
sites can be introduced by transformation or transfection.
Two cases need to be distinguished: site-directed
integration by homologous recombination, and random
integration by non-homologous recombination, transposon
movement or retroviral infection. The first is easy in the
case of yeast and a few bacterial species, and more difficult
for higher eukaryotes. The second is possible in all systems.

3. Insertion vectors

Two types can be distinguished: 

-1- Site specific cassettes that introduce the I-SceI
site together with a selectable marker.

For yeast: all are pAF100 derivatives (Thierry et al. (1990)
YEAST 6:521-534) containing the following marker genes:

pAF101: URA3 (inserted in the HindIII site)
pAF103: Neo (inserted in BglII site)
pAF104: HIS3 (inserted in BglII site)
pAF105: KanR (inserted in BglII site)
pAF106: Kan (inserted in BglII site)
pAF107: LYS2 (inserted between HindIII and EcoRV)

A restriction map of the plasmid pAF100 is shown in Fig. 9.
The nucleotide sequence and restriction sites of regions of
plasmid pAF100 are shown in Figs. 10A and 10B.
Many transgenic yeast strains with the I-SceI site at various
and known places along chromosomes are available.

-2- Vectors derived from transposable elements or 

For E. coli and other bacteria: mini Tn5 derivatives
containing the I-SceI site and
pTSm ω: StrR
pTKm ω: KanR (see Fig. 11)
pTTc ω: TetR



For yeast: pTyω6 is a pD123 derivative in which the I-SceI
site has been inserted in the LTR of the Ty element. 
(Fig. 12) 



For mammalian cells: 
pMLV LTR SAPLZ: containing the I-SceI site in the LTR of MLV
and Phleo-LacZ (Fig. 13). This vector is first grown in ψ2
cells (3T3 derivative, from R. Mulligan). Two transgenic
cell lines with the I-SceI site at undetermined locations in
the genome are available: 1009 (pluripotent nerve cells,
J.F. Nicolas) and D3 (ES cells able to generate transgenic
animals).

4. The nested chromosomal fragmentation strategy

The nested chromosomal fragmentation strategy for 
genetically mapping a eukaryotic genome exploits the unique 
properties of the restriction endonuclease I-SceI, such as an
18 bp long recognition site. The absence of natural I-SceI
recognition sites in most eukaryotic genomes is also 
exploited in this mapping strategy. 
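
To make the significance of an 18 bp recognition site
concrete, the following minimal sketch estimates how many
times a fixed 18 bp sequence would be expected to occur by
chance in a genome of a given size. It assumes equal base
frequencies, independent positions, and a non-palindromic
site that can occur on either strand; the function name and
the approximate genome sizes are illustrative assumptions and
are not taken from this specification.

```python
def expected_random_sites(site_length_bp, genome_size_bp):
    """Expected chance occurrences of one fixed site in a random sequence.

    Assumes each base is A, C, G or T with probability 1/4, independently,
    and counts both strands (site assumed not to be its own reverse complement).
    """
    positions = genome_size_bp - site_length_bp + 1   # possible start positions
    if positions <= 0:
        return 0.0
    per_position = 1.0 / (4 ** site_length_bp)        # chance of a match at one start
    return 2 * positions * per_position               # factor 2: either strand


# Number of distinct 18 bp sequences (4**18).
distinct_18mers = 4 ** 18

# Approximate, assumed genome sizes in base pairs.
yeast_expected = expected_random_sites(18, 12_000_000)
mammal_expected = expected_random_sites(18, 3_000_000_000)

# For comparison, a conventional 6 bp recognition site in the same yeast genome.
six_cutter_expected = expected_random_sites(6, 12_000_000)
```

Under these assumptions the expected number of chance
occurrences of an 18 bp site is far below one per genome,
whereas a 6 bp site is expected thousands of times, which is
consistent with using artificially inserted sites as unique
landmarks.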

First, one or more I-SceI recognition sites are
artificially inserted at various positions in a genome, by
homologous recombination using specific cassettes containing
selectable markers or by random insertion, as discussed
supra. The genome of the resulting transgenic strain is then
cleaved completely at the artificially inserted I-SceI
site(s) upon incubation with the I-SceI restriction enzyme.
The cleavage produces nested chromosomal fragments.

The chromosomal fragments are then purified and
separated by pulsed field gel (PFG) electrophoresis, allowing
one to "map" the position of the inserted site in the
chromosome. If total DNA is cleaved with the restriction
enzyme, each artificially introduced I-SceI site provides a
unique "molecular milestone" in the genome. Thus, a set of
transgenic strains, each carrying a single I-SceI site, can
be created which defines physical genomic intervals between
the milestones. Consequently, an entire genome, a chromosome
or any segment of interest can be mapped using artificially
introduced I-SceI restriction sites.

The nested chromosomal fragments may be transferred to a 
solid membrane and hybridized to a labelled probe containing 
DNA complementary to the DNA of the fragments. Based on the 
hybridization banding patterns that are observed, the 
eukaryotic genome may be mapped. The set of transgenic 
strains with appropriate "milestones" is used as a reference 
to map any new gene or clone by direct hybridization. 

Example 1: Application of the Nested Chromosomal
Fragmentation Strategy to the Mapping of Yeast Chromosome XI

This strategy has been applied to the mapping of yeast
chromosome XI of Saccharomyces cerevisiae. The I-SceI site
was inserted at 7 different locations along chromosome XI of
the diploid strain FY1679, hence defining eight physical
intervals in that chromosome. Sites were inserted from a
URA3-1-I-SceI cassette by homologous recombination. Two
sites were inserted within genetically defined genes, TIF1
and FAS1; the others were inserted at unknown positions in
the chromosome from five non-overlapping cosmids of our
library, taken at random. Agarose embedded DNA of each of
the seven transgenic strains was then digested with I-SceI
and analyzed by pulsed field gel electrophoresis (Fig. 14A).
The position of the I-SceI site of each transgenic strain in
chromosome XI is first deduced from the fragment sizes
without consideration of the left/right orientation of the
fragments.

Orientation was determined as follows. The most
telomere proximal I-SceI site from this set of strains is in
the transgenic E40 because the 50 kb fragment is the shortest
of all fragments (Fig. 15A). Therefore, the cosmid clone
pUKG040, which was used to insert the I-SceI site in the
transgenic E40, is now used as a probe against all chromosome
fragments (Fig. 14B). As expected, pUKG040 lights up the two
fragments from strain E40 (50 kb and 630 kb, respectively).
The large fragment is close to the entire chromosome XI and
shows a weak hybridization signal due to the fact that the
insert of pUKG040, which is 38 kb long, contains less than 4
kb within the large chromosome fragment. Note that the
entire chromosome XI remains visible after I-SceI digestion,
due to the fact that the transgenic strains are diploids in
which the I-SceI site is inserted in only one of the two
homologs. Now, the pUKG040 probe hybridizes to only one
fragment of all other transgenic strains, allowing unambiguous
left/right orientation of I-SceI sites (see Fig. 15B). No
significant cross hybridization between the cosmid vector and
the chromosome subfragment containing the I-SceI site
insertion vector is visible. Transgenic strains can now be
ordered such that I-SceI sites are located at increasing
distances from the hybridizing end of the chromosome
(Fig. 15C) and the I-SceI map can be deduced (Fig. 15D).
Precision of the mapping depends upon PFGE resolution and
optimal calibration. Note that actual left/right orientation
of the chromosome with respect to the genetic map is not
known at this step.

To help visualize our strategy and to obtain more precise
measurements of the interval sizes between I-SceI sites, a
new pulsed field gel electrophoresis with the same transgenic
strains now placed in order was made (Fig. 16). After
transfer, the fragments were hybridized successively with
cosmids pUKG040 and pUKG066, which light up, respectively,
all fragments from the opposite ends of the chromosome (clone
pUKG066 defines the right end of the chromosome as defined
from the genetic map because it contains the SIR1 gene). A
regular stepwise progression of chromosome fragment sizes is
observed. Note some cross hybridization between the probe
pUKG066 and chromosome III, probably due to some repetitive
DNA sequences.

All chromosome fragments, taken together, now define
physical intervals as indicated in Fig. 15D. The I-SceI map
obtained has an 80 kb average resolution.
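
The ordering step described in this example can be sketched
as follows. For each transgenic strain, the size of the one
fragment lit up by the end probe gives the distance of that
strain's I-SceI site from the hybridizing end; sorting by
that size orders the sites, and successive differences give
the physical intervals. Only the 50 kb value for strain E40
and the approximate chromosome length (50 kb + 630 kb) come
from the text above; the other strain names and sizes are
hypothetical placeholders, as is the function name.

```python
def order_sites(hybridizing_fragment_kb, chromosome_kb):
    """Order strains by the size of the fragment detected by an end probe.

    hybridizing_fragment_kb: strain name -> size (kb) of the fragment that
        hybridizes to the probe from one end of the chromosome.
    Returns (ordered strain names, list of interval sizes in kb).
    """
    ordered = sorted(hybridizing_fragment_kb.items(), key=lambda item: item[1])
    names = [name for name, _ in ordered]
    positions = [size for _, size in ordered]
    # Interval boundaries: chromosome end, each site in order, other end.
    bounds = [0] + positions + [chromosome_kb]
    intervals = [right - left for left, right in zip(bounds, bounds[1:])]
    return names, intervals


# E40 (50 kb) is from the text; H1..H6 and their sizes are hypothetical.
fragments = {"H3": 330, "E40": 50, "H1": 120, "H5": 540,
             "H2": 200, "H6": 610, "H4": 450}
names, intervals = order_sites(fragments, chromosome_kb=680)

# Seven sites define eight intervals; their mean is the average resolution.
average_resolution_kb = sum(intervals) / len(intervals)
```

With seven sites on a chromosome of roughly 680 kb, eight
intervals result, so an average resolution on the order of
80 kb is what this arithmetic would lead one to expect.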

Example 2: Application of the Nested Chromosomal
Fragmentation Strategy to the Mapping of Yeast Artificial
Chromosome (YAC) Clones

This strategy can be applied to YAC mapping with two 
possibilities. 

-1- Insertion of the I-SceI site within the gene of
interest using homologous recombination in yeast. This
permits mapping of that gene in the YAC insert by I-SceI
digestion in vitro. This has been done and works.

-2- Random integration of I-SceI sites along the YAC
insert by homologous recombination in yeast using highly
repetitive sequences (e.g., B2 in mouse or Alu in human).
Transgenic strains are then used as described in ref. P1 to
sort libraries or map genes.

The procedure has now been extended to a YAC containing
450 kb of mouse DNA. To this end, a repeated sequence of
mouse DNA (called B2) has been inserted in a plasmid
containing the I-SceI site and a selectable yeast marker
(LYS2). Transformation of the yeast cells containing the
recombinant YAC with the plasmid linearized within the B2
sequence resulted in the integration of the I-SceI site at
five different locations distributed along the mouse DNA
insert. Cleavage at the inserted I-SceI sites using the
enzyme has been successful, producing nested fragments that
can be purified after electrophoresis. Subsequent steps of
the protocol exactly parallel the procedure described in
Example 1.

Example 3: Application of Nested Chromosomal Fragments to the
Direct Sorting of Cosmid Libraries

The nested chromosomal fragments can be purified from
preparative PFG and used as probes against clones from a
chromosome XI specific sublibrary. This sublibrary is
composed of 138 cosmid clones (corresponding to eight times
coverage) which have been previously sorted from our complete
yeast genomic libraries by colony hybridization with PFG
purified chromosome XI. This collection of unordered clones
has been sequentially hybridized with chromosome fragments
taken in order of increasing sizes from the left end of the
chromosome. Localization of each cosmid clone on the I-SceI
map could be unambiguously determined from such
hybridizations. To further verify the results and to provide
a more precise map, a subset of all cosmid clones, now
placed in order, has been digested with EcoRI,
electrophoresed and hybridized with the nested series of
chromosome fragments in order of increasing sizes from the
left end of the chromosome. Results are given in Figure 17.

For a given probe, two cases can be distinguished:
cosmid clones in which all EcoRI fragments hybridize with the
probe and cosmid clones in which only some of the EcoRI
fragments hybridize (e.g., compare pEKG100 to pEKG098 in Fig.
17b). The first category corresponds to clones in which the
insert is entirely included in one of the two chromosome
fragments, the second to clones in which the insert overlaps
an I-SceI site. Note that, for clones of the pEKG series,
the EcoRI fragment of 8 kb is entirely composed of vector
sequences (pWE15) that do not hybridize with the chromosome
fragments. In the case where the chromosome fragment
possesses the integration vector, a weak cross hybridization
with the cosmid is observed (Fig. 17e).

Examination of Fig. 17 shows that the cosmid clones can
unambiguously be ordered with respect to the I-SceI map
(Fig. 13E), each clone falling either in a defined interval
or across an I-SceI site. In addition, clones from the
second category allow us to place some EcoRI fragments on the
I-SceI maps, while others remain unordered. The complete set
of chromosome XI-specific cosmid clones, covering altogether
eight times the equivalent of the chromosome, has been sorted
with respect to the I-SceI map, as shown in Fig. 18.
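
The two categories of cosmid clones distinguished in this
example can be expressed as a simple decision rule. The
sketch below is illustrative only: the function name, the
boolean encoding of hybridization results, and the example
patterns are assumptions, and the vector-only EcoRI fragment
is assumed to have been excluded before classification.

```python
def classify_clone(insert_fragments_hybridize):
    """Classify a cosmid clone against one chromosome-fragment probe.

    insert_fragments_hybridize: one boolean per EcoRI fragment of the insert
        (vector-only fragment excluded), True if it hybridizes to the probe.
    """
    if not insert_fragments_hybridize:
        raise ValueError("at least one insert fragment is required")
    hits = sum(1 for hit in insert_fragments_hybridize if hit)
    if hits == len(insert_fragments_hybridize):
        # Every fragment lights up: insert lies entirely within the probe fragment.
        return "within probe fragment"
    if hits == 0:
        # Nothing lights up: insert lies entirely in the other chromosome fragment.
        return "outside probe fragment"
    # Only some fragments light up: the insert overlaps the I-SceI site.
    return "spans I-SceI site"


# Hypothetical hybridization patterns for three clones.
all_hit = classify_clone([True, True, True])
some_hit = classify_clone([True, False, True, False])
none_hit = classify_clone([False, False])
```

Applying the rule with the nested probes in order of
increasing size places each clone either in a defined
interval or across an I-SceI site.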

5. Partial restriction mapping using I-SceI

In this embodiment, complete digestion of the DNA at the
artificially inserted I-SceI site is followed by partial
digestion with bacterial restriction endonucleases of choice.
The restriction fragments are then separated by
electrophoresis and blotted. Indirect end labelling is
accomplished using left or right I-SceI half sites. This
technique has been successful with yeast chromosomes and
should be applicable without difficulty for YAC

Partial restriction mapping has been done on yeast DNA
and on mammalian cell DNA using the commercial enzyme I-SceI.
DNA from cells containing an artificially inserted I-SceI
site is first cleaved to completion by I-SceI. The DNA is
then treated under partial cleavage conditions with bacterial
restriction endonucleases of interest (e.g., BamHI) and
electrophoresed along with size calibration markers. The DNA
is transferred to a membrane and hybridized successively
using the short sequences flanking the I-SceI sites on either
side (these sequences are known because they are part of the
original insertion vector that was used to introduce the
I-SceI site). Autoradiography (or another equivalent detection
system using non-radioactive probes) permits the visualization
of ladders, which directly represent the succession of the
bacterial restriction endonuclease sites from the I-SceI
site. The size of each band of the ladder is used to
calculate the physical distance between the successive
bacterial restriction endonuclease sites.
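
The calculation described in the last sentence can be
sketched as follows. Each band in a ladder is assumed to run
from the I-SceI site to one restriction site on the probed
side, so the band size is that site's distance from the
I-SceI site and the differences between successive bands are
the distances between neighboring sites. The function name
and the band sizes are hypothetical.

```python
def distances_between_sites(band_sizes_kb):
    """Convert ladder band sizes into distances between successive sites.

    band_sizes_kb: sizes (kb) of the bands detected by indirect end labelling
        on one side of the I-SceI site, in any order.
    Returns the distance from the I-SceI site to the first restriction site,
    followed by the distance between each pair of successive sites.
    """
    ladder = sorted(band_sizes_kb)
    previous = [0.0] + ladder[:-1]          # the I-SceI site sits at position 0
    return [size - prev for prev, size in zip(previous, ladder)]


# Hypothetical ladder read from an autoradiograph, in kb.
bands = [7.5, 2.0, 12.0]
spacing = distances_between_sites(bands)    # [2.0, 5.5, 4.5]
```

Probing with the flanking sequence from the other side of the
I-SceI site gives the corresponding ladder in the opposite
direction.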

Application of I-SceI for In Vivo
Site Directed Recombination

1. Expression of I-SceI in yeast

The synthetic I-SceI gene has been placed under the
control of a galactose inducible promoter on multicopy
plasmids pPEX7 and pPEX408. Expression is correct and
induces effects on site as indicated below. A transgenic
yeast with the I-SceI synthetic gene inserted in a chromosome
under the control of an inducible promoter can be
constructed.

2. Effects of site specific double strand breaks in
yeast (refs. 18 and P4)

Effects on plasmid-borne I-SceI sites:
Intramolecular effects are described in detail in
Ref. 18. Intermolecular (plasmid to chromosome)
recombination can be predicted.

Effects on chromosome integrated I-SceI sites:
In a haploid cell, a single break within a chromosome at
an artificial I-SceI site results in cell division arrest
followed by death (only a few % survive). Presence of an
intact sequence homologous to the cut site results in repair
and 100% cell survival. In a diploid cell, a single break
within a chromosome at an artificial I-SceI site results in
repair using the chromosome homolog and 100% cell survival.
In both cases, repair of the induced double strand break
results in loss of heterozygosity with deletion of the
non-homologous sequences flanking the cut and insertion of the
non-homologous sequences from the donor DNA molecule.

3. Application for in vivo recombination of YACs in yeast

Construction of a YAC vector with the I-SceI restriction
site next to the cloning site should permit one to induce
partially overlapping. This is useful for the construction
of contigs.

4. Prospects for other organisms

Insertion of an I-SceI restriction site has been done
for bacteria (E. coli, Yersinia enterocolitica, Y. pestis, Y.
pseudotuberculosis), and mouse cells. Cleavage at the
artificial I-SceI site in vitro has been successful with DNA
from the transgenic mouse cells. Expression of I-SceI from
the synthetic gene in mammalian or plant cells should be
successful.

The I-SceI site has been introduced in mouse cells and
bacterial cells as follows:
-1- Mouse cells:

-a- Mouse cells (ψ2) were transfected with the DNA
of the vector pMLV LTR SAPLZ containing the I-SceI site using
standard calcium phosphate transfection technique.

-b- Transfected cells were selected in DMEM medium
containing phleomycin with 5% fetal calf serum and grown
under 12% CO2, 100% humidity at 37°C until they formed
colonies.

-c- Phleomycin resistant colonies were subcloned
once in the same medium.

-d- Clone MLOP014, which gave a titer of 10^ virus
particles per ml, was chosen. This clone was deposited at
C.N.C.M. on May 5, 1992 under culture collection accession
No. I-1207.

-e- The supernatant of this clone was used to infect
mouse cells (1009) by spreading 10^ virus particles on
10^ cells in DMEM medium with 10% fetal calf serum and 5 mg/
ml of polybrene. Medium was replaced 6 hours after
infection by the same fresh medium.

-f- 24 hours after infection, phleomycin resistant
cells were selected in the same medium as above.

-g- Phleomycin resistant colonies were subcloned
once in the same medium.

-h- One clone was picked and analyzed. DNA was
purified with standard procedures and digested with I-SceI
under optimal conditions.
-2- Bacterial cells:

Mini Tn5 transposons containing the I-SceI
recognition site were constructed in E. coli by standard
recombinant DNA procedures. The mini Tn5 transposons are
carried on a conjugative plasmid. Bacterial conjugation
between E. coli and Yersinia is used to integrate the mini
Tn5 transposon in Yersinia. Yersinia cells resistant to
kanamycin, streptomycin or tetracycline are selected (vectors
pTKM-ω, pTSM-ω and pTTc-ω, respectively).

Several strategies can be attempted for the site
specific insertion of a DNA fragment from a plasmid into a
chromosome. This will make it possible to insert transgenes
at predetermined sites without laborious screening steps.
Strategies are:

-1- Construction of a transgenic cell in which the
I-SceI recognition site is inserted at a unique location in a
chromosome. Cotransformation of the transgenic cell with the
expression vector and a plasmid containing the gene of
interest and a segment homologous to the sequence in which
the I-SceI site is inserted.

-2- Insertion of the I-SceI recognition site next to or
within the gene of interest carried on a plasmid.
Cotransformation of a normal cell with the expression vector
carrying the synthetic I-SceI gene and the plasmid containing
the I-SceI recognition site.

-3- Construction of a stable transgenic cell line in
which the I-SceI gene has been integrated in the genome under
the control of an inducible or constitutive cellular
promoter. Transformation of the cell line by a plasmid
containing the I-SceI site next to or within the gene of
interest.

Site directed homologous recombination: diagrams of
successful experiments performed in yeast are given in
Fig. 19.